To  appear  in  Oournal  of  the  Optical  Society  of  America  A 


Adaptive  Bayesian  Signal  Reconstruction 
with  A  Priori  Model  Implementation  and 
Synthetic  Examples  for  X-ray 
Crystallography 

Peter  C.  Doerschuk 
School  of  Electrical  Engineering 
Purdue  University 
West  Lafayette,  Indiana  47907-1285 
(317)  494-1742 
February  26,  1991 


Abstract 

In  [1]  a  signal  reconstruction  problem  motivated  by  X-ray  crystallography  is  (ap¬ 
proximately)  solved  in  a  Bayesian  statistical  approach.  The  signal  is  zero-one,  peri¬ 
odic,  and  substantial  statistical  a  priori  information  is  known,  which  is  modeled  with 
a  Markov  random  field.  The  data  are  inaccurate  magnitudes  of  the  Fourier  coefficients 
of  the  signal.  The  solution  is  explicit  and  the  computational  burden  is  independent  of 
the  signal  dimension.  In  this  paper  a  detailed  parameterization  of  the  a  priori  model 
appropriate  for  crystallography  is  proposed  and  symmetry-breaking  parameters  in  the 
solution  are  used  to  perform  data-dependent  adaptation  of  the  estimator.  The  adap- 
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tation  attempts  to  minimize  the  effects  of  the  spherical  model  approximation  used  in 
the  solution.  Several  examples  in  one  and  two  dimensions  based  on  simulated  data  are 
presented. 
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1  Introduction 


In  [1]  a  novel  Bayesian  statistical  approach  is  presented  to  a  class  of  phase-retrieval  problems 
exemplified  by  the  inverse  problem  of  single-crystal  X-ray  crystallography.  In  this  paper 
parameters  are  proposed  in  the  a  priori  model  that  are  suitable  for  modeling  bond-length 
limitations  in  covalently-bound  molecules,  free  parameters  in  the  estimator  are  employed  in 
order  to  design  a  data-dependent  adaptive  estimator,  and  several  numerical  examples  in  one 
and  two  dimensions  based  on  simulated  data  are  provided. 

The  situation  from  [1]  is  recalled  in  this  and  the  following  paragraph.  There  is  a  periodic 
object  which  takes  only  values  zero  and  one,  and  the  observer  makes  corrupted  measurements, 
denoted  z/*,,  of  the  magnitude  of  its  Fourier  transform.  The  goal  is  to  reconstruct  the  object 
from  these  measurements  and  from  a  priori  probabilistic  information  concerning  the  class  of 
likely  objects.  One  period  of  the  object  is  modeled  as  a  binary-valued  finite-lattice  Markov 
random  field  (MRF)  denoted  (j)n  and  the  corruption  of  the  measurements  is  modeled  as 
independent  Gaussian  random  variables  with  ^-dependent  variances.  Both  the  MRF  and  the 
Gaussian  observation  errors  can  be  put  in  the  form  of  energy  functions  with  the  corresponding 
probabilities  in  the  form  of  the  Gibbs  distribution.  A  Bayesian  conditional  mean  estimation 
problem  is  approximately  solved.  In  order  to  compute  E(<j>n\{y })  one  computes  averages. 
However,  because  only  the  magnitude  of  the  Fourier  transform  is  available,  the  definition  of 
the  coordinate  system  on  the  object  is  lost.  For  example,  in  one  dimension,  all  information 
concerning  the  origin  in  the  sense  of  translation  and  concerning  the  handedness  in  the  sense  of 
inversion  through  the  origin  (n  — ►  — n)  is  lost.  If  one  blindly  averages  over  all  configurations 
of  { <f> },  the  result  is  a  DC  field.  Therefore,  an  additional  term  in  the  energy  function  is 
introduced  which  favors  certain  configurations.  For  example,  in  one  dimension,  the  additional 
term  breaks  the  symmetries  of  the  previous  energy  function  with  respect  to  translation  and 
inversion.  The  term  has  the  form  of  a  convolution  of  the  field  <pn  with  a  kernel  function  . 
The  ipn  are  the  free  parameters  alluded  to  above,  and  the  subject  of  this  paper  is  a  method 
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for  choosing  them  in  a  data-dependent  fashion. 

In  solving  the  estimation  problem  in  [1]  two  approximations  were  made.  The  first  ap¬ 
proximation  was  the  spherical  model  which  relaxed  the  0-1  nature  of  the  lattice  variables 
<j>n.  The  second  approximation  was  to  evaluate  integrals  asymptotically  as  the  observation 
noise  variance  approached  zero.  (Actually  two  different  asymptotic  problems  were  proposed 
and  solved  depending  on  whether  the  a  priori  energy  function  was  scaled  along  with  the 
observational  energy  function).  Given  these  two  approximations,  explicit  (e.g.,  no  numeri¬ 
cal  quadratures  or  nonlinear  optimizations)  formulae  were  computed  for  an  approximation, 
denoted  mn,  to  E(<f>n\{y})  as  a  function  of  0  and  y.  These  formulae  are  easy  to  compute, 
can  accommodate  missing  data  and  varying  observation  noise  variance,  and  are  essentially 
the  same  in  any  dimensional  space. 

For  any  choice  of  0  this  is  a  valid  Bayesian  estimation  problem.  Therefore,  the  goal  is  to 
choose  0  in  an  intelligent  fashion.  One  natural  approach,  which  is  the  approach  taken  here, 
is  to  try  to  choose  0,  or  equivalently,  the  estimation  problem,  to  minimize  the  effects  of  the 
two  approximations  made  in  the  solution  of  the  estimation  problem. 

In  this  paragraph  the  Hamiltonian  is  recalled  from  [1].  The  Hamiltonian  is  in  three  parts- 
the  a  priori  probability  part  Hapnon,  the  conditional  observational  probability  part  Hohs ,  and 
the  symmetry  breaking  part  Hsb\  That  is, 


H  =  fPpnon  +  Hohs  +  H 


s.b. 


Equations  are  stated  for  the  one-dimensional  case.  In  d  dimensions  exactly  the  same  equa¬ 
tions  hold  with  indices,  lattice  dimensions,  and  sums  all  expanded  to  d  dimensions.  The 
a  priori  part  is  the  most  general  shift-invariant  quadratic,  specifically, 


^■apriori 


L— 1  L— 1 

=  ^2  ]T  0„+„j  W2(ni,n2)<i>n+n2  +wl<Pn 

ni=0  7i2=0 

L-l 

=  n 

n=0 


(1) 


where  L  is  the  size  of  the  lattice  which  is  also  the  period  of  the  crystal  when  measured  in 
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lattice  spacings.  The  conditional  observational  part  is  Gaussian,  specifically, 


^  =  \®k\2 

n— 0 

H°h‘  =  zAfe-^w))2, 

Jk= 0 

where  y*,  and  cr^  are  observed  in  the  experiment.  However,  the  <7^  values  are  assumed  to 
be  exact.  Finally,  the  symmetry  breaking  part  is  a  convolution  with  the  kernel  function 
specifically, 

L- 1 

Hs  h ■  =  qL^2 

n=0  ^ 

where  ?/>„  is  real  and  periodic  with  period  L. 

In  this  paragraph  the  asymptotics  are  recalled  from  [1].  Two  different  asymptotic  limits 
are  considered.  The  first  limit,  denoted  Problem  1,  is  purely  a  small  observation  noise  limit. 
That  is,  the  observation  noise  variance  a\  satisfies  a\  —  y<r \  and  A  f  oo.  The  second 
limit,  denoted  Problem  2,  combines  the  small  observation  noise  limit  with  a  proportional 
scaling  of  the  a  priori  Hamiltonian.  That  is,  the  observation  noise  variance  a\  and  the 
parameters  W2(kx,  k2)  and  it>i  from  the  a  priori  portion  of  the  Hamiltonian  satisfy  a\  =  raf,, 
W2(ki,  k2)  —  AyTk^^i,  k2),  wx  =  Axuh,  A  f  oo,  and  \  is  a  fixed  real  number. 

The  remainder  of  this  paper  is  organized  in  the  following  fashion.  In  Section  2  a  parame¬ 
terization  of  the  a  priori  Hamiltonian  is  described  that  is  appropriate  for  X-ray  crystallogra¬ 
phy.  In  Section  3  an  optimization  problem  is  described  for  the  choice  of  Implementation 
issues  are  briefly  discussed  in  Section  4.  In  Section  5  one-dimensional  examples  are  pre¬ 
sented  which  emphasize  the  need  to  break  the  symmetries  discussed  above.  In  Section  6 
a  one-dimensional  example  is  presented  which  is  sufficiently  small  such  that  it  is  possible 
to  compute  estimator  performance  statistics  versus  observation  noise  intensity  for  the  exact 
conditional  mean  estimator,  the  approximations  to  the  conditional  mean  estimator  (with 
either  definition  of  the  asymptotics),  and  the  trivial  estimator  that  simply  takes  a  random 
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sample  from  the  a  priori  distribution  for  its  estimate.  A  two-dimensional  example  is  pre¬ 
sented  in  Section  7.  Finally,  in  Section  8  the  results  to  date  are  discussed  and  directions  for 
future  research  are  described. 

2  Use  of  iJapnon  in  X-Ray  Crystallography 

In  this  section  one  method  for  using  the  quadratic  a  priori  Hamiltonian  H& pnorl  to  model 
bond-length  limitations  in  covalently  bound  molecules  is  described.  By  interpreting  |  *  |  as 
absolute  value  in  one  dimension  or  as  Euclidean  norm  in  higher  dimensions,  the  following 
applies  for  arbitrary  dimension  of  the  signal. 

Define  a  minimum  (maximum)  bond  length  lx  (I2)  measured  in  lattice  spacings.  Assume 
that  there  is  an  atom  at  site  no.  Therefore,  <j>no  =  1.  In  order  to  avoid  the  presence  of  a 
neighboring  atom  at  less  than  the  minimum  bond  length,  the  a  priori  probability  function 
is  low  if  there  is  an  atom  in  the  region  {n||n  —  no|  <  /1}.  In  addition,  in  order  to  avoid 
unbound  atoms,  the  a  priori  probability  function  is  low  if  there  is  no  neighboring  atom  in 
the  region  {n|/i  <  \n  —  n0|  <  12}.  In  order  to  use  these  ideas  with  the  quadratic  /Ppnon 
defined  previously,  the  penalty  on  crowding  (i.e.,  an  atom  in  the  region  {n||n  —  n0|  <  /, }) 
must  become  progressively  greater  as  more  atoms  fall  in  the  region  and  the  penalty  on  being 
unbound  (i.e.,  {n|/i  <  \n  —  no|  <  h})  must  become  progressively  smaller  as  more  atoms  fall 
in  the  region.  (Note,  however,  that  as  more  atoms  are  present  in  this  region  they  in  turn 
will  become  crowded  with  respect  to  each  other,  thereby  lowering  the  a  priori  probability 
function  again).  The  penalties  described  here  are  associated  with  site  n0  and  therefore  wx, 
W2  should  be  chosen  so  that  the  penalties  appear  in  u„0  (eqn.  1).  Since  there  are  no  penalties 
that  depend  only  on  the  value  of  a  single  site,  wx  =  0.  Since  the  penalties  only  occur  when 
4>na  =  1,  it  would  be  convenient  to  choose  W2(nx,n2)  so  that  it  is  zero  unless  nx  =  0.  Since 
W2  is  symmetric,  define  an  asymmetric  uj2  whose  symmetric  part  is  W2.  The  definitions  are 
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u>2(ni,n2)  =  i(u>2(ni,n2)+  *h2(n2,ni))  where  u>2(n!,n2)  is  defined  by 

d>2(rai,n2)  =  0  «i  7^  0,  Vn2 

(Pi  if  1  <  K!  <  h 
ih2(0,n2)  =  <  p2  if  lt  <  |n2|  <  l2 

0  otherwise 

and  where  p\  >  0,  p2  <  0,  and  1  <  l\  <  /2.  Generalizations  to  Hamiltonians  that  include  a 
quadratic  penalty  on  deviations  from  a  desired  valence  number  are  also  straightforward. 

To  this  point  in  the  present  paper  and  throughout  [1],  this  model  and  estimator  have  been 
motivated  from  the  perspective  of  atomic  resolution  X-ray  crystallography  where  the  lattice 
variables  are  the  presence  or  absence  of  an  atom  at  that  location.  However,  there  are  low 
resolution  structures  of  large  molecules  (e.g.,  proteins)  that  might  also  be  addressed.  In  the 
low  resolution  protein  setting,  one  is  interested  in  locating  the  path  in  three  dimensions  of  a 
solid  “tube”  of  high  electron  density  that  follows  the  polypeptide  backbone  in  a  background 
of  relatively  low  electron  density.  The  new  interpretation  of  the  lattice  variables  would  then 
be  that  0  corresponds  to  background  and  1  corresponds  to  polypeptide  backbone,  which 
is  approximated  to  have  only  one  value  of  electron  density.  The  purpose  of  the  a  priori 
Hamiltonian  H* pnon  would  then  be  quite  different.  Specifically,  it  would  seek  to  make  the 
occupied  sites  (the  l’s)  form  a  connected  solid  tube  rather  than  scattering  through  space. 
Therefore,  it  would  more  resemble  a  ferromagnetic  Ising  model  Hamiltonian  than  the  bond- 
length  Hamiltonian  described  in  the  previous  paragraphs.  After  this  change  in  interpretation 
of  the  lattice  variables  and  change  in  the  constants  in  the  a  priori  Hamiltonian,  the  remainder 
of  the  calculation  in  [1]  would  be  unaltered. 

3  The  Choice  of  ’ip 

The  symmetry  breaking  portion  of  the  Hamiltonian,  Ha'h’,  contains  the  parameters  ipn  and 
for  any  choice  of  ijjn  an  approximation  to  the  Bayesian  conditional  mean  estimator  was 
computed  in  [1].  These  degrees  of  freedom  are  used  to  design  an  adaptive  estimator. 
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As  described  in  the  Introduction,  the  calculations  of  [1]  contain  two  approximations, 
namely  the  spherical  and  asymptotic  approximations.  The  asymptotic  approximation  is  of 
less  concern  because  it  can  be  improved  on  by  taking  additional  terms  in  the  asymptotic 
series  and  because  the  crystallography  problem  is  in  fact  a  small-noise  problem.  However, 
the  spherical  approximation,  by  relaxing  the  atomicity  of  the  electron  density  which  has 
been  central  to  successful  direct  methods  in  crystallography,  is  of  more  concern.  In  addi¬ 
tion,  natural  improvements  on  the  spherical  approximation  are  difficult  to  compute  because 
they  couple  the  Fourier  coefficients.  Therefore,  the  adaptive  estimation  scheme  focuses  on 
decreasing  the  effects  of  the  spherical  approximation. 

The  idea  is  to  choose  a  new  ip  for  each  data  set  such  that  the  choice  minimizes  a  cost 
function  C .  C  is  a  function  both  of  ipn  and  of  the  unthresholded  approximations  mn  to 
the  conditional  expectations  E(<pn\{y})  that  result  from  using  ipn  in  the  estimator.  The 
cost  function  is  a  linear  combination  C  —  71  Cj  -f  72C2  +  73C3  of  three  individual  costs.  C\ 
attempts  to  improve  on  the  spherical  model  by  penalizing  a  manifestation  of  the  spherical 
model’s  relaxation  of  the  0-1  constraints  on  the  lattice  variables.  C2  attempts  to  control 
the  energy  in  ipn  so  that  it  neither  vanishes  and  thereby  fails  to  break  the  symmetries  nor 
dominates  the  Hamiltonian.  C3  attempts  to  select  an  estimator  that  is  confident  of  its 
answer,  that  is,  an  estimator  where  mn  are  near  0  or  1. 

For  binary  lattice  variables  the  conditional  expectations  are  constrained  to  [0, 1].  The  es¬ 
timates  computed  using  the  spherical  approximation  routinely  violate  this  condition.  There¬ 
fore,  if  mn  is  the  computed  approximation  to  E((pn\{y}),  ip  is  chosen  to  minimize  a  cost 
function  that  penalizes  mn  £  [0, 1].  The  work  described  here  uses  the  simple  cost  function 


Ci 


<7l(x) 


'  (ar  —  l)2  if  x  >  1 
<  0  if  0  <  x  <  1  • 


'  x2  if  x  <  0 

At  the  same  time,  it  is  undesirable  for  ip  to  vanish  because  of  its  role  in  symmetry  breaking, 
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or  to  grow  too  large  and  dominate  the  estimator.  Therefore,  deviations  in  energy  from  a 
desired  energy  Eq  are  penalized  with  the  cost  function 

C2  =  m\i-Ea)2 

where  ||a:||p  =  (En3^)1^-  Choice  of  Eo  is  discussed  in  the  following  section. 

Finally,  among  all  the  estimators  parameterized  by  ip,  it  is  desirable  to  choose  a  confident 
estimator,  that  is,  an  estimator  for  which  the  resulting  mn  are  near  0  and  1.  Therefore 
deviations  in  mn  from  0  and  1  are  penalized  by  using  the  cost  function 

C3  = 

(  0  if  x  >  1 

03(2:)  =  <  (x  —  l)2x2  if  0  <  x  <  1  • 

■0  if  x  <  0 

(I  prefer  to  penalize  mn  >  1  and  mn  <  0  through  C\  only). 

The  costs  Ci  and  C3  depend  on  mn,  the  approximation  to  the  expectation  E(<pn\{y}), 
which  in  turn  is  a  function  of  ip.  When  optimizing  ip  the  mn  are  recomputed  for  each  new 
value  of  ip. 

There  are  obviously  many  other  ways  in  which  one  might  set  ip.  One  natural  entirely 
different  category  of  methods  would  be  to  set  ip  so  that  the  resulting  Bayesian  estimator  has 
some  desirable  statistical  property.  Here  the  choice  would  be  made  once  forever  according  to 
some  ensemble  criteria  while  in  the  method  described  above  the  choice  is  made  for  each  data 
set  according  to  a  pathwise  criteria.  For  any  ip,  the  Bayesian  conditional  mean  estimator 
minimizes  the  conditional  expectation  of  the  squared  difference  between  the  actual  and 
estimated  fields.  One  might  then  choose  ip  to  minimize  higher  order  statistics  or  a  special 
discrete  class  of  errors  that  are  particularly  important  from  the  application  perspective. 
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4  Implementation  of  the  ^  Optimization 

When  computing  globally  optimal  solutions  were  not  sought.  Rather  I  have  thought  in 

terms  of  downhill  search  starting  from  an  initial  condition  V,,'c'  where  the  initial  condition  (1) 

has  a  clear  symmetry-breaking  interpretation  and  (2)  is  used  to  set  Eq  in  Ci  by  E0  =  |  \\. 

This  choice  seems  preferable  because  it  is  computationally  straightforward  and  because  it 

attempts  to  keep  some  balance  between  the  symmetry-breaking  role  of  Vb  only  implicitly 

included  in  the  optimization  criteria,  and  the  optimization  role  of  0  (since  a  simple  downhill 

optimization  will  presumably  remain  in  some  domain  of  attraction  of  the  initial  condition 

and  the  initial  condition  has  a  clear  symmetry- breaking  interpretation). 

(  0  if  n  =  0 

For  one-dimensional  problems  the  initial  condition  was  tb'nc'  =  < 

{ (L  —  n) / L  if  n  0 

This  choice  sets  Hs  h •  to  be  the  first  moment  of  the  field  (j)n  and  therefore  clearly  breaks 
translational  symmetry.  Two-dimensional  problems  have  more  types  of  symmetries  than 
one- dimensional  problems  and  an  initial  condition  that  clearly  breaks  more  than  transla¬ 
tional  symmetry  is  desirable.  In  the  work  described  here  the  initial  condition  was  'tblnc'  = 
z/>(nx;  Li,  ax)V>(n2;  L?,  a2)  where  Li  and  T2  are  the  dimensions  of  the  lattice,  ax  and  a2  are 
constants,  and 

(  0  if  n  =  0 

^(n;I,a)  =  <  .  (2) 

\(L-n)(l  +  a(L-n))/L  if  n  ±  0 

This  choice  breaks  translational  and  inversion  (equivalently,  rotations  by  ir)  symmetries.  If 
L\  =  Li  then  rotations  by  n- 1  are  also  symmetry  operators  and  these  symmetries  are  also 
broken  by  this  choice  of  ip. 

Though  the  initial  condition  for  in  two-dimensional  problems  is  a  product  of  functions 
of  individual  coordinates,  the  optimization  considered  any  two-dimensional  xp  not  restricted 
to  this  product  form.  However,  it  may  be  sufficient  and  would  certainly  save  computational 
expense  to  consider  throughout  the  optimization  only  0  with  the  product  form. 

A  very  inefficient  scheme  was  used  for  optimizing  xjj  which,  however,  results  in  an  un¬ 
constrained  optimization  problem.  The  inefficiency  stems  from  the  parameterization.  Recall 
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from  [1]  that  the  estimator  has  the  following  structure.  First,  fix  if).  Second,  compute  ap¬ 
proximations  Mk  to  _E($fc|{j/}),  the  conditional  expectations  of  the  Fourier  coefficients  of 
the  field  <^>n.  The  formulae  for  this  computation  are  in  terms  of  the  Fourier  coefficients 
of  tpn  and  the  data  {?/}.  Third,  compute  approximations  mn  to  E(<f>n\{y}),  the  conditional 
expectation  of  the  field,  by  computing  the  inverse  Fourier  series  of  Mk-  Fourth,  threshold 
the  mn  to  compute  the  estimated  field. 

The  L  real  variables  'fin  n  €  {0,  —  1}  were  used  to  parameterize  fi.  Because  each  of 

these  variables  independently  ranges  over  all  of  J R,  methods  for  unconstrained  optimization 
problems  can  be  applied.  However,  a  computational  penalty  is  paid  because  first  fin  must 
be  transformed  into  at  the  cost  of  one  Fourier  series,  and  then  Mk  must  be  transformed 
into  mn,  at  the  cost  of  a  second  Fourier  series.  If  the  parameterization  was  'I'o  €  R,  Z  T  ,  and 
| fiT |  (k  €  {1, . . . ,  ^~}),  then  the  optimization  is  no  longer  unconstrained  since  Z'lT  6  [0, 2x) 
and  I’F^I  €  [0,oo),  but  the  cost  of  one  of  the  Fourier  series  calculations  is  avoided.  Note  in 
this  regard  that  C 2  can  easily  be  computed  in  terms  of  fin  or  in  terms  of  |'Ffc|.  Also  note  that 
the  above  is  purely  a  discussion  of  efficiency  per  iteration.  The  behavior  of  the  optimization 
criteria  as  a  function  of  the  parameterization  (e.g.,  straight  versus  highly  curved  valleys)  will 
determine  the  number  of  iterations  required  and  this  aspect  has  not  been  considered  at  all. 

Furthermore,  an  inefficient  multidimensional  downhill  simplex  method  [2]  was  used  to 
solve  the  unconstrained  optimization  problem.  This  method  uses  function  evaluations  but 
no  derivatives  and  attempts  to  move  a  nondegenerate  simplex  in  RL  defined  originally  by 
L  +  1  initial  condition  vectors  downhill  by  simple  geometric  moves  (e.g.,  reflection  of  the 
vertex  with  the  highest  function  value  through  the  opposite  face  of  the  simplex  to  a  position 
with  a  lower  function  value)  until  the  simplex  arrives  at  a  minimum.  This  algorithm  was 
chosen  because  the  software  is  robust  and  was  easily  available.  As  mentioned  in  the  previous 
section,  at  each  step  mn  (the  approximation  to  E(fin\{y}))  is  recomputed  for  each  choice  of 

fi- 

Finally,  all  the  equations,  both  for  the  one-  and  two-dimensional  cases,  are  for  odd 
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dimensioned  lattices.  As  mentioned  in  [1],  this  case  was  considered  because  the  fact  that 
<j>n  is  real  constrains  its  Fourier  coefficients.  Specifically,  in  one  dimension,  it  implies 
that  §k  —  $L-k-  For  L  even  there  are  therefore  two  coefficients  that  are  guaranteed  real 
(more  in  higher  dimensions)  while  for  L  odd  there  is  one  such  coefficient  in  any  dimension. 
Therefore  the  equations  are  much  simpler  for  L  odd.  However,  L  odd  is  unfavorable  from  the 
computational  point  of  view  because  it  is  incompatible  with  using  power  of  two  Fast  Fourier 
Transforms.  In  one  dimension  the  Fourier  series  was  computed  by  explicit  summation  at 
a  cost  proportional  to  L 2  while  in  two  dimensions  the  Fourier  series  was  computed  by  a 
one- dimensional  chirp  z  transform  on  rows  and  then  on  columns.  In  light' of  the  inefficient 
parameterization,  software,  and  Fourier  series  situation,  measures  of  algorithm  speed  are  not 
reported. 

In  [1]  a  very  brief  sketch  was  given  of  an  algorithm  for  locating  the  critical  point  in 
the  asymptotics  that  is  exact,  noniterative,  and  requires  only  order  L  computation.  This 
algorithm  is  described  more  fully  in  the  Appendix.  In  the  examples  described  in  this  paper, 
there  is  always  a  single  critical  point  so  the  simple  formulae  of  [1]  apply.  In  fact,  in  the 
problems  where  I  have  investigated,  there  is  only  a  single  maximum  for  the  exponent  which 
determines  the  critical  point. 

5  One-Dimensional  Examples 

These  examples  are  included  to  emphasize  the  importance  of  the  symmetry  breaking  role  of 
$■ 

The  calculations  of  this  section  use  the  Problem  2  asymptotics  where  the  observational 
Hamiltonian  Hohs  and  a  priori  Hamiltonian  f/apnon  are  both  scaled  by  the  asymptotic  pa¬ 
rameter  and  the  ratio  between  their  scalings  is  x ■  In  the  Hamiltonian  of  Sections  1  and  2  take 
al  —  v2  =  0.1  Vfc  €  {0, . .  r  j  L  —  1},  x  =  1,  ?  =  1,  Pi  =  1,  p2  =  —2,  li  =  4,  /2  =  6,  and  L  =  27. 
For  the  adaptive  estimator  design  cost  take  either  C°  =  C\  +  C2  or  C1  =  C\  +  C2  +  C3.  Two 
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simulations  are  described  based  on  two  different  choices  of  the  underlying  field  {$}.  In  both 
cases  the  simulated  data  are  pseudo  Gaussian  random  numbers  with  variance  a2  added  to 
the  Fourier  coefficients  of  the  field.  Since  the  fields  have  either  two  or  three  occupied  sites, 
these  noise  levels  correspond  well  to  the  1  to  3  percent  errors  in  real  crystallographic  data  [3, 
p.  193]. 

f  °  if  n  0  {1,5} 

For  simulated  data  based  on  the  field  <t>2n  =  <  using  cost  C°  (respec- 

1  1  if  n  €  {1,5} 

tively,  C1)  the  approximate  conditional  expectations  are  shown  in  Figure  la  (respectively, 
Figure  2a)  and  optimized  functions  are  shown  in  Figure  lb  (respectively,  Figure  2b).  In 
both  cases  the  thresholded  estimate  will  be  exactly  correct  though  in  the  case  of  C°  (i.e., 
not  specifically  selecting  a  confident  estimator),  the  estimator  almost  fails  to  locate  the  first 
occupied  site. 

[  0  if  n  i  {1,5,9} 

For  simulated  data  based  on  the  field  <f>^  =  <  using  cost  C°  (respec- 

{ 1  if  n  €  {1,5,9} 

tively,  C1)  the  approximate  conditional  expectations  are  shown  in  Figure  3a  (respectively, 
Figure  4a)  and  optimized  ipn  functions  are  shown  in  Figure  3b  (respectively,  Figure  4b).  In 
the  case  of  C°  (i.e.,  not  specifically  selecting  a  confident  estimator),  the  estimator  fails  to 
locate  the  first  and  third  occupied  sites.  Instead  there  are  two  broad  low  peaks.  The  reason 
for  the  broad  low  peaks  is  a  failure  to  adequately  break  inversion  symmetry.  Specifically, 
the  left  (respectively,  right)  half  of  the  left  peak,  the  central  peak,  and  the  left  (respectively, 
right)  half  of  the  right  peak  corresponds  to  intervals  of  5  and  4  (respectively,  4  and  5)  corre¬ 
sponding  to  the  signal  and  the  inverted  signal.  When  C 1  is  used,  this  inversion  symmetry  is 
broken  and  only  one  of  the  signal  and  its  inversion  (specifically  its  inversion)  are  substantially 
represented  in  the  a  posteriori  expectation  and  the  thresholded  estimate  is  exactly  correct. 
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6  Estimator  Performance  Statistics  Versus  Observa¬ 


tion  Noise  Intensity 

This  section  describes  results  concerning  the  estimator  performance  statistics  versus  obser¬ 
vation  noise  intensity  for  the  exact  conditional  mean  estimator,  the  two  approximations  to 
the  conditional  mean  estimator  (that  is,  the  two  definitions  of  the  asymptotics),  and  the 
trivial  estimator  that  disregards  the  data  and  simply  takes  a  random  sample  from  the  a  pri¬ 
ori  distribution  for  its  estimate.  Comparison  of  the  performance  of  the  nontrivial  and  trivial 
estimators  for  different  performance  measures  provides  measures  of  the  signal  to  noise  ratio 
improvement  due  to  the  use  of  the  nontrivial  estimator.  To  perform  these*  calculations  the 
lattice  must  be  very  small-one-dimensional  with  L  —  17  sites-because  the  computation  is 
done  by  Monte  Carlo  methods  and  the  calculation  of  an  individual  estimate  for  the  exact 
estimator  is  done  by  exhaustive  enumeration. 

The  data  is  generated  by  first  computing  configurations  of  the  H* pnori  described  in  Sec¬ 
tion  2  and  then  computing  the  observational  transformation  (Fourier  transform  followed  by 
magnitude  squared)  and  adding  Gaussian  pseudo-random  variables.  The  combination  of 
configuration  generation,  observational  transformation,  and  additive  noise  is  denoted  the 
“forward”  problem. 

The  three  nontrivial  inverse  problems  are  matched  to  the  forward  problem,  that  is,  they 
share  the  same  values  for  the  parameters,  except  that  the  configurations  in  the  forward 
problem  are  chosen  with  q  =  0  for  the  symmetry  breaking  parameter. 

H apnon  parameters  to*  =  0,  li  =  3,  I2  =  5,  pi  =  1.5,  and  p 2  =  —0.5.  The  observational 
transformation  has  no  parameters.  The  additive  white  pseudo-random  Gaussian  noise  is 
parameterized  by  its  standard  deviation,  denoted  <7*,  which  is  constant  over  k.  A  range 
of  values  for  a  were  considered.  See  the  figures.  In  all  calculations,  there  are  no  missing 
observations. 

The  exact  conditional  mean  inverse  problem  is  matched  to  the  forward  problem  plus  the 
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addition  of  q  —  1.0. 

The  approximate  conditional  mean  inverse  problems  use  71  =  72  =  73  =  1  in  the  opti¬ 
mization  criteria  for  the  design  of  t/>.  Two  parameter  sets  are  considered  which  differ  in  the 
role  of  A  which  results  in  a  difference  in  only  the  values  for  A,  Xi  and  a.  Both  are  parameter 
sets  such  that  the  forward  and  inverse  problems  are  matched  except  for  q.  The  first  set 
of  parameters  is  matched  to  the  forward  problem  with  the  exception  of  q  —  1.0  (as  in  the 
exact  inverse  problem)  and  has  in  addition  x  —  1-0,  A  =  1.0,  and  /3  —  1.0.  The  second  set 
of  parameters  uses  A  instead  of  a2  to  control  the  accuracy  of  the  observations.  Then  x 's 
used  to  compensate  in  i/apnori  for  the  value  of  A  ^  1.  Specifically,  the  quantities  ^  and  \  A 
have  the  same  value  in  both  parameter  sets  but  in  the  first  set  A  =  1  while  dn  the  second  set 
<7  =  1.  For  Problem  2  asymptotics  both  the  first  and  second  set  of  parameters  are  used.  As 
expected,  both  calculations  yield  the  same  results.  This  fact  amounts  to  a  numerical  check 
on  the  equations1  and  software.  For  Problem  1  asymptotics  only  the  first  set  of  parameters 
are  used. 

Statistics  are  collected  on  N  —  1000  trials  for  each  value  of  a.  The  underlying  fields  {d} 
are  drawn  from  the  probability  mass  function  exp(— /3H&pT>or>)  where  Z  —  exp (—/3Hapr'ov 
using  the  Metropolis  algorithm2.  At  a  given  <7,  the  same  set  of  stochastic  realizations  of  the 
forward  problem  is  used  for  all  three  nontrivial  estimators.  Since  1000  ~  210  and  there  are 
only  217  configurations  of  the  underlying  lattice  random  variables,  the  lattice  variable  con¬ 
tribution  to  the  total  randomness  in  the  measurements  is  fairly  well  sampled.  (There  are 
actually  fewer  distinguishable  configurations  because  translations  and  inversions  are  unob¬ 
servable). 

Two  measures  of  performance  are  considered.  Both  measures  are  expectations  which  are 

xTo  see  that  the  equations  are  invariant  under  parameter  changes  such  that  and  yA  are  constant,  it  is 

helpful  to  introduce  a  new  Lagrange  variable  r'  to  replace  r  (see  [1,  Section  10])  where  r'  =  rA. 

2The  initial  condition  for  the  Metropolis  algorithm  is  independent  pseudo  random  binary  numbers  at 

each  site  with  equal  probability  of  0  and  1,  the  system  is  equilibrated  for  200000  trials,  and  then  the  result 

of  every  10000th  trial  is  used. 
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approximately  computed  by  averaging  the  results  of  the  N  trials3.  Let  $n  be  an  estimate  of 
<j)n.  Because  the  phase  of  <j>n  is  not  measured,  d>n+n0  (translation  by  no)  or  <^_n  (inversion 
through  the  origin)  are  equally  satisfactory  estimates.  Therefore  in  this  section  min  means 
a  minimization  over  a  possible  inversion  through  the  origin  and  a  translation  applied  to 
<t>n-  Let  |jx||p  =  (£n  \xn\pylp.  The  first  measure  is  the  expected  value  of  the  /2  norm  of 
the  difference  between  the  true  and  reconstructed  signals  after  a  possible  translation  and 
reflection  in  order  to  achieve  the  best  match,4  i.e.,  E(l2)  =  2?min||<£  -  <j> ||2.  Let 
be  the  minimum  number  of  lattice  sites  where  (f>n  <^n  and  the  minimum  is  taken  over 
a  possible  inversion  through  the  origin  and  a  translation  of  4>.  Note  that  'min  ||d>  —  <p||2  = 
min  1 1 —  (^Ui  =  <j>).  Therefore  the  performance  results  for  mean  squared  error,  mean 

absolute  error,  and  mean  number  of  lattice  site  differences  are  all  the  same.  The  variability 
of  performance  around  the  average  performance  is  also  important.  Therefore,  for  average 
performance  measured  in  the  sense  of  E(l2),  the  variability  of  performance  in  the  sense  of 
the  standard  deviation  SD(/2)  is  also  computed,  i.e.,  SD(/2)  =  yfyar  min  || 6  —  d>||2. 

The  second  measure,  denoted  /per feet,  is  the  probability  of  an  error-free  estimate,  i.e., 

4>n  —  $>n  for  all  n,  again  modulo  inversion  and  translation.  That  is,  /perfect  =  Pr(p(/,  d>)  = 

(l  if  i  —  j 

°)  =  E8Mj), o  where  =  n 

Statistics  on  the  trivial  estimator  are  computed  in  the  following  fashion.  For  the  previous 
computations,  13  sets  of  1000  random  samples  from  the  a  priori  distribution  ( q  =  0)  were 
computed.  For  this  computation  take  all  unordered  pairs  of  not  equal  sets  (78  total),  label 
one  as  truth  and  one  as  estimate  (the  labeling  does  not  matter  since  the  cost  functions  are 
invariant  under  exchange  of  truth  and  estimate),  compute  performance  statistics  for  that 

pair,  and  finally  average  the  performance  statistics  over  all  pairs  considered.  The  results  are 
3Weighting  by  the  probability  mass  function  is  not  necessary  since  the  configurations  are  drawn  from  the 
probability  mass  function. 

4Recall  from  [1]  that  the  conditional  mean  estimator  is  optimal  for  the  cost  function  \\<p  —  <j> ||?.  That  is. 
among  all  estimators  0({y}),  the  conditional  mean  estimator  is  the  estimator  that  minimizes  E{\\4>— <?||3|{y}). 
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versus 


^perfect  =  74.3077,  E(l2)  =  1.4792,  and  SD(/2)  =  0.5346. 

Figures  5  and  6  show  plots  of  nperfect  =  /perfect-^  (Figure  5)  and  E(l2)  (Figure  6) 
a.  On  both  plots  the  three  cr-dependent  traces  plotted  with  solid  lines  show  performance  for 
the  exact  conditional  mean  estimator  (the  best  performance),  the  Problem  2  asymptotics 
approximate  estimator,  and  the  Problem  1  asymptotics  approximate  estimator  while  the  a- 
independent  trace  shows  performance  for  the  trivial  estimator  that  disregards  the  data.  The 
traces  for  the  approximate  estimators  only  differ  significantly  at  high  a  where  the  Problem  2 
estimator  consistently  outperforms  the  Problem  1  estimator.  On  Figure  6  the  traces  plotted 
with  dotted  lines  show  plus/minus  one  sample  standard  deviation  (i.e.y  SD(/2))  relative 
to  E(l2)  for  the  three  nontrivial  estimators.  The  reason  that  the  plots  xlo  not  have  the 
two  highest  a  data  points  for  the  exact  computation  is  that  the  number  of  probabilistically 
important  configurations  increases  as  a  increases.  The  running  time  of  the  current  algorithm 
for  the  exact  computation  is  sensitive  to  this  number  and  becomes  prohibitively  long. 

The  maximal  signal  possible  at  any  k  in  this  problem  is  L  =  17.  Therefore,  ignoring  the 
constraints  placed  on  the  lattice  configuration  by  the  a  priori  distribution,  the  large  a  region 
reaches  into  the  realm  of  rather  noisy  data.  This  is  born  out  by  the  fact  that  the  exact  and 
trivial  (observation  independent)  estimators  provide  rather  similar  performance  by  a  =  6. 
Similarly,  the  small  a  region  represents  rather  clean  data.  The  general  character  of  the 
performance  is  that  at  low  and  high  a  all  three  nontrivial  estimators  provide  rather  similar 
performance  while  in  the  transition  zone  the  exact  estimator  provides  good  performance  to 
slightly  higher  a  than  the  approximate  estimators.  These  general  characteristics  are  common 
in  exact  versus  approximate  nonlinear  filters,  e.g.,  in  phase-locked  loops  [4,  Chapter  6].  Note 
that  at  high  a  with  respect  to  the  nperfect  performance  measure  the  exact  estimator  performs 
less  well  than  the  approximate  estimators,  reflecting  the  fact  that  the  exact  conditional  mean 
estimator  is  not  optimal  for  the  nperfect  performance  criteria. 

The  fact  that  the  Problem  2  estimator  outperforms  the  Problem  1  estimator  at  high  a  is 
expected.  Recall  that  the  Problem  2  estimator  weights  the  a  priori  Hamiltonian  //apnon  by 
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the  asymptotic  parameter  A  while  the  Problem  1  estimator  does  not.  Since  the  estimators 
are  computed  in  the  A  — ►  oo  limit  (even  though  they  are  used  at  finite  A),  the  Problem  2 
formulation  relies  more  on  a  priori  information  and  at  high  a  a  priori  information  is  the  only 
available  information. 

In  Figure  6  the  mean  plus/minus  one  standard  deviation  lines  are  quite  distant  from 
the  mean.  One  of  the  reasons  for  this  is  the  discrete  nature  of  the  <j>  estimates.  Since 
min  \\<f>  —  ,  i j>)  and  fi  takes  on  only  small  integer  values,  the  standard  deviation  of 

min  ||^  —  <^||2  is  larger  than  might  be  expected  because  all  deviations  from  mean  performance 
must  be  at  least  a  certain  minimal  distance  away.  In  the  Table  a  complete  set  of  histograms 
as  <7  varies  are  shown  for  the  number  of  trials  (out  of  N  total  trials)  in  which  /i(o,  <p )  takes 
each  of  its  observed  values  when  (j>  is  computed  by  the  exact  conditional  mean  estimator. 

At  high  cr,  all  three  nontrivial  estimators  give  estimates  with  higher  values  of  E(l 3) 
than  the  trivial  estimator  that  disregards  the  data.  I  attribute  this  to  the  q  =  1  used  to 
break  the  symmetries  in  the  nontrivial  estimators,  which  is  not  equal  to  the  actual  value  of 
q  =  0  used  both  to  select  the  true  configurations  from  the  a  priori  distribution  and  to  select 
the  estimated  configurations  in  the  trivial  estimator.  One  could  compute  a  second  trivial 
estimator  which  again  disregards  the  data  but  draws  from  a  distribution  with  q  =  1.  I  did 
not  do  this  because  this  estimate  is  (slightly)  more  costly  to  compute  than  the  q  —  0  estimate 
and  will  have  poorer  performance  so  there  is  no  logical  reason  to  use  such  an  estimator.  In 
addition  there  may  be  a  statistical  effect  since  the  trivial  estimator’s  performance  is  based 
on  a  far  larger  number  of  trials. 

In  Figure  7  a  small  subset  of  the  truth  configurations  used  in  these  simulations  are 
shown.  While  the  small  size  and  the  one-dimensional  nature  of  the  lattice  preclude  a  really 
“molecular”  appearance,  still  the  basic  idea  is  present-the  l’s,  which  correspond  to  the  atoms, 
are  generally  neither  adjacent  nor  isolated  in  both  directions  by  great  distances.  Rather,  the 
l’s  occur  on  average  one  “covalent  bond  length”  from  each  other.  Note  that  the  boundary 
conditions  for  this  space  group  are  the  standard  toroidal  boundary  conditions  so  that  l’s 
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near  an  end  may  be  bound  to  l’s  near  the  other  end. 


7  A  Two-Dimensional  Example 

In  one-dimensional  examples  there  is  no  concept  of  bond  angle  and  therefore  there  are 
surprisingly  few  degrees  of  freedom  in  a  configuration  that  has  a  low  value  for  f7apnon.  In 
this  section  a  two  dimensional  example  is  presented  which  demonstrates  that  the  approximate 
estimators  can  work  in  situations  with  these  additional  degrees  of  freedom  which  are  only 
indirectly  constrained  by  the  bond-length  oriented  a  priori  Hamiltonian.  Also,  this  example, 
though  still  small,  has  many  more  lattice  sites  than  any  of  the  previous  examples. 

The  lattice  measures  9  by  13  sites.  The  true  configuration  has  7  occupied  sites  located 
at  positions  (5,10),  (2,8),  (3,5),  (5,3),  (4,0),  (1,0),  and  (0,3).  The  observation  noise  has 
variance  0.75  constant  over  k.  The  Ha pr,on  of  Section  2  has  parameters  pi  —  1.0,  p-_>  =  —2.0, 
li  —  2.3,  and  I2  =  4.2.  The  approximate  conditional  mean  inverse  problem  uses  Problem  2 
asymptotics  and  has  parameters  q  =  1,  x  =  T  A  =  1,  and  =  1.  The  initial  condition 
for  the  ^  optimization  has  parameters  £*1  =  |  =  0.111111  and  a?  =  =  0.0769231.  This 

choice  of  on  and  c*2  balances  the  linear  and  quadratic  terms  in  the  xjj  initial  condition  in  the 
sense  that  the  linear  term  ( L  —  n)  and  the  quadratic  term  a(L  —  n)2  evaluated  at  n  =  0  (see 
eqn.  2)  have  the  same  value  in  both  coordinate  directions. 

Results  for  the  approximate  conditional  mean  estimator  are  shown  in  Figures  8-11.  Fig¬ 
ure  8  shows  the  initial  condition  for  ^  as  a  smooth  surface  in  three  dimensions  above  the 
two-dimensional  lattice  array.  Figure  9  shows  the  optimized  ip  in  the  same  format  as  Figure  8. 
Note,  as  expected  in  light  of  the  simple  optimization  scheme,  that  the  initial  and  terminal 
•i j}  are  qualitatively  quite  similar.  Therefore  the  optimization,  which  does  not  explicitly  in¬ 
clude  a  measure  of  symmetry-breaking  performance,  has  probably  not  greatly  altered  the 
symmetry  breaking  built  into  the  initial  condition  0.  Figure  10  shows  the  true  4>  field  and 
the  estimated  <j>  field.  Since  these  are  binary  fields,  they  are  shown  as  patterns  of  occupied 
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sites  rather  than  as  projections  of  surfaces.  The  estimate  is  exactly  correct,  though  shifted 
from  the  true  configuration  by  an  inversion  through  the  origin  followed  by  a  translation  by 
the  vector  (5,11).  Better  performance  cannot  be  hoped  for  since  inversion  and  translation 
do  not  effect  the  magnitude  of  the  Fourier  transform  coefficients.  Finally,  Figure  11  shows 
the  final  (i.e.,  computed  with  the  optimized  ip)  approximation  to  E((pn\{y}),  the  conditional 
expectation  of  the  lattice  field,  in  the  same  format  as  Figures  8  and  9.  Figure  10,  the  es¬ 
timated  4>  field,  results  from  thresholding  Figure  11  at  value  |.  In  light  of  the  large  local 
maxima,  it  is  straightforward  to  anticipate  the  result  of  the  thresholding  operation. 

8  Discussion  and  Future  Directions 

In  this  paper  a  parameterization  of  iJa pnori  that  is  appropriate  for  crystallography  is  proposed; 
data-adaptive  ideas  for  the  choice  of  ip,  the  convolutional  kernel  in  the  symmetry-breaking 
Hs'h',  are  described;  and  several  numerical  examples  in  one  and  two  dimensions  on  simu¬ 
lated  data  are  given.  The  examples  included  a  tiny  one-dimensional  problem  where  it  is 
computationally  practical  to  compute  the  estimator  performance  statistics  versus  observa¬ 
tion  noise  intensity  for  the  exact  conditional  mean  estimator  by  brute  force  and  compare 
with  the  approximate  estimators.  These  examples,  though  small  in  size,  demonstrate  that 
these  approximations  based  on  the  spherical  model  and  small-noise  asymptotics  provide  rea¬ 
sonable  performance  relative  to  the  exact  conditional  mean  estimator,  which  is  extremely 
expensive  to  compute.  In  the  future  superior  optimization  schemes  are  needed,  as  discussed 
in  Section  4,  in  order  to  allow  larger  problems  to  be  considered  and  more  general  space  group 
symmetries  need  to  be  included  here  and  in  [1]. 
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10  Appendix 

This  appendix  is  based  on  the  results  of  [1,  Section  10]  and  uses  the  definitions  of  that  paper. 
In  this  appendix  an  algorithm  is  described  which  efficiently  solves  for  the  critical  point,  that 
is  the  constrained  global  minimum,  of  f3H\.  The  location  of  the  critical  point  is  needed 
in  order  to  compute  the  small-noise  asymptotic  evaluation  of  the  integrals  required  for  the 
conditional-mean  Bayesian  estimator.  In  the  process  of  computing  the  global  minimum,  all 
local  minimum  are  also  located. 

In  the  case  r  =  for  some  k  €  {1, . . . ,  —  B  I  simply  set  pj  =  0  while  in  theory 

pj.  is  undetermined  (though  nonnegative)  and  I  should  eventually  integrate  over  a  manifold 
when  computing  the  asymptotics  of  the  integrals.  The  manifold  will  not  be  all  of  pj.  >  0 
because  if  pj.  is  too  large,  then  the  constraint  equation  will  not  have  a  real  root  for  p0. 

There  are  three  straightforward  initialization  steps  followed  by  a  loop  over  at  most  + 1 
elements 
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Initialization  (1).  Define 


.  ck  —  l^(h,2a  +  h,2b)  k  G  {1,  •  •  • ,  — - — }. 

Initialization  (2).  Defined,  G  {1  G  { 1 , . . . ,  ^y^ }  a  permutation  of  { 1 , . . . ,  } 

such  that  ckl  <  ch  <  ...  <  c^.  Define  jmax  =  max{j|fc,-  G  {1,...,^}  -  B),  that  is, 
CfcWx  is  the  largest  c  for  which  no  observation  was  taken  among  frequencies  k  ^  0.  If  {j\kj  G 
{1  -  B)  =  %,  then  set  jmax  =  0. 

Initialization  (3).  Define  Ij  C  R  j  G  {1, . . . ,  ^y^  4- 1}  by 

'(-oo,cfcl)  if  J  =  1 

Ij  =  <  if  J  6  {2,.,.,^}  . 

[cfcL_j,oo)  if  j  =  yy  +  1 

~TT 

Loop  (1).  Fix  j*  G  {imax  +  1, . . . ,  yp  +  1}.  The  algorithm  will  eventually  consider  every 
element  of  this  set. 

Loop  (2).  Assume  r  G  Ij-. 

Loop  (3).  By  Loop  (2),  if  kj  G  B  then 

if  j  >  j~ 
l  0  if  j  <  j“ 

while  if  kj  G  {1, . . . ,  ^y1}  —  B  then 

Pk}  =  0. 

Note  that  these  are  not  yet  computable  because  r  is  not  yet  known. 

Loop  (4).  Use  the  constraint  condition  and  the  gradient  condition  at  k  =  0  to  solve  for 
po  and  r.  Specifically,  define 
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where  if  j*  =  — +  1  then  {j*,  =  0  and  the  sums  are  zero.  Both  sums  are 

computable  at  this  point.  The  constraint  condition  is 


-Po  ~  Po  +  Mj*,B)  -  rdi(j*,  B)  =  0 


(3) 


and  the  k  =  0  component  of  the  gradient  condition  is 

2 

—  bo, lb  ~  2(£>o,2a  +  bo,2b)pO  +  460l4aPo  +  r(— /?0  “  1)  =  0. 


(4) 


Assume  B )  =  0.  Then  eqn.  3  simplifies  to 


-£po  ~  po  +  Mr,  B)  =  0 

which  determines  the  p0  roots  and  there  are  exactly  two  possibly  equal  choices.  If  the  roots 
are  complex  conjugates  then  there  are  no  feasible  solutions.  For  each  real  root,  if  the  root  is 
then  r  for  that  root  is  undetermined.  Otherwise  eqn.  4  specifies  that 

—bo,  16  —  2(60,20  +  bo,2b)po  +  46oi4a/?o 

T  =  - 5 - • 

1-lPo 

Assume  B)  7^  0.  Then  eqn.  3  can  be  solved  for  r  giving 

ZPo  ~  Po  +  Mj“,B) 

T  =  — 37IF75) - '  (j) 


Use  of  this  result  in  eqn.  4  gives 

Po  (j2  46o,4adi(i*,  B)  j  +  pi 

+Po  2(6o,2a  +  bo,2b)di(j*,  B)  +  —  do(j*,  B)  +  1  j 
+  {~bo,ibdi B)  -  d0{j *,  B))  =  0 


which  determines  the  po  roots  and  there  are  exactly  three  possibly  equal  choices.  One  choice 
is  sure  to  be  real  while  a  pair  may  be  complex  conjugates  and  thus  not  acceptable.  Then, 
for  each  real  root,  eqn.  5  specifies  the  corresponding  r. 

If  po  £  R  or  r  ^  then  discard  that  solution.  For  each  retained  solution  po  r.  use  the 
equations  of  Loop  (3)  to  compute  p *  k  €  {1, . . . ,  and  then  compute  8H\. 
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Loop  (5).  Among  the  values  of  j5H\  found  in  Step  (7)  (there  may  be  none)  and  the 
minimum  value  of  /3H\  found  so  far  from  other  j*,  save  the  minimum  value  and  the  r,  pQ, 
Pk  k  6  {1, . . . ,  for  which  the  value  occurred. 

Loop  (6).  If  all  j *  €  {jmax  +  1, . . . ,  +  1}  have  been  considered  then  the  result  from 

the  most  recent  iteration  of  Loop  (5)  is  optimal  and  the  algorithm  terminates.  Otherwise, 
return  to  Loop  (1)  and  select  a  new  value  for  j". 
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Figure  3:  Results  for  Three  Occupied 
Before  Thresholding.  Part  b:  Optimi 
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Figure  5:  Estimator  Performance  Statistics:  nperfect  versus  a  for  exact,  approximate  (Prob¬ 
lem  1  and  Problem  2  asymptotics),  and  trivial  estimators.  Recall  that  nperfect,  is  the  number 
of  trials  (out  of  iV  =  1000  total)  where  (after  translation  and  inversion  if  required)  the  re¬ 
construction  <f>  exactly  matches  the  true  o  and  cr  is  the  observation  noise  standard  deviation. 
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Figure  6:  Estimator  Performance  Statistics:  E{U)  versus  cr  for  exact,  approximate  (Prob¬ 
lem  1  and  Problem  2  asymptotics),  and  trivial  estimators.  Recall  that  Efa)  is  the  expec¬ 
tation  of  the  h  norm  of  the  difference  between  the  reconstruction  6  and  the  true  <j>  and  a 
is  the  observation  noise  standard  deviation.  The  traces  with  labels  in  parentheses  are  plus 
or  minus  one  sample  standard  deviation  relative  to  the  sample  mean  which  carries  the  same 
label  but  without  parentheses.  The  minus  one  standard  deviation  traces  are  truncated  (near 
cr  —  2.)  when  they  become  negative. 
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Figure  7:  Example  One-dimensional  Lattice  Configurations  and  the  Corresponding  Valtu 
of  the  A  Priori  Hamiltonian. 
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Figure  9:  Optimized  w. 
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Figure  10:  Four  Adjacent  Copies  of  the  Unit  Cell  for  the  2D  Example.  Occupied  sites  (i.e.. 
n  for  which  <t>n  =  1)  are  denoted  by  filled  circles.  All  other  sites  are  unoccupied.  The 
boundaries  of  each  unit  cell  are  indicated  by  thick  grid  lines.  The  idealized  chemical  bond 
locations  are  indicated. 

Part  (a):  Original  <f>n. 

Part  (b):  Reconstructed  <f>ni  which  is  Figure  11  thresholded  at  1/2. 

The  original  and  reconstructed  <p„  agree  exactly  after  an  inversion  through  the  origin  followed 
by  a  translation  by  the  vector  (5,11). 
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Table  1:  Histograms  of  at  Each  Value  of  a  for  N  =  1000  Trials  of  the  Exact  Condi¬ 

tional  Mean  Estimator.  Each  row  corresponds  to  a  different  value  of  a  which  is  listed  in  the 
first  column.  Each  column  beneath  the  label  n(4>,  <£)  is  headed  by  an  integer.  The  entries  in 
such  a  column  are  the  number  of  times  fi(<p,  <j>)  takes  that  value  among  the  N  =  1000  trials 
at  each  cr. 
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